G-DBSCAN: A GPU Accelerated Algorithm for Density-based Clustering
نویسندگان
چکیده
With the advent of Web 2.0, we see a new and differentiated scenario: there is more data than that can be effectively analyzed. Organizing this data has become one of the biggest problems in Computer Science. Many algorithms have been proposed for this purpose, highlighting those related to the Data Mining area, specifically the clustering algorithms. However, these algorithms are still a computational challenge because of the volume of data that needs to be processed. We found in the literature some proposals to make these algorithms feasible, and, recently, those related to parallelization on graphics processing units (GPUs) have presented good results. In this work we present the G-DBSCAN, a GPU parallel version of one of the most widely used clustering algorithms, the DBSCAN. Although there are other parallel versions of this algorithm, our technique distinguishes itself by the simplicity with which the data are indexed, using graphs, allowing various parallelization opportunities to be explored. In our evaluation we show that the G-DBSCAN using GPU, can be over 100x faster than its sequential version using CPU.
منابع مشابه
بررسی مشکلات الگوریتم خوشه بندی DBSCAN و مروری بر بهبودهای ارائهشده برای آن
Clustering is an important knowledge discovery technique in the database. Density-based clustering algorithms are one of the main methods for clustering in data mining. These algorithms have some special features including being independent from the shape of the clusters, highly understandable and ease of use. DBSCAN is a base algorithm for density-based clustering algorithms. DBSCAN is able to...
متن کاملImprovement of density-based clustering algorithm using modifying the density definitions and input parameter
Clustering is one of the main tasks in data mining, which means grouping similar samples. In general, there is a wide variety of clustering algorithms. One of these categories is density-based clustering. Various algorithms have been proposed for this method; one of the most widely used algorithms called DBSCAN. DBSCAN can identify clusters of different shapes in the dataset and automatically i...
متن کاملMRG-DBSCAN: An Improved DBSCAN Clustering Method Based on Map Reduce and Grid
DBSCAN is a density-based clustering algorithm. This algorithm clusters data of high density. The traditional DBSCAN clustering algorithm in finding the core object, will use this object as the center core, extends outwards continuously. At this point, the core objects growing, unprocessed objects are retained in memory, which will occupy a lot of memory and I/O overhead, algorithm efficiency i...
متن کاملA density based clustering approach to distinguish between web robot and human requests to a web server
Today world's dependence on the Internet and the emerging of Web 2.0 applications is significantly increasing the requirement of web robots crawling the sites to support services and technologies. Regardless of the advantages of robots, they may occupy the bandwidth and reduce the performance of web servers. Despite a variety of researches, there is no accurate method for classifying huge data ...
متن کاملMr. Scan: A Hybrid/Hybrid Extreme Scale Density Based Clustering Algorithm
Density-based clustering algorithms are a widely-used class of data mining techniques that can find irregularly shaped clusters and cluster data without prior knowledge of the number of clusters the data contains. DBSCAN is the most well-known density-based clustering algorithm. We introduce our extension of DBSCAN, called Mr. Scan, which uses a hybrid/hybrid parallel implementation that combin...
متن کامل